[Spec][MOE][Internal Op] Specification of MOE internal operation #32255
Conversation
# Experts computation part (GEMM3_SWIGLU)
x_proj = matmul(reshaped_hidden_states, weight_0, transpose_a=False, transpose_b=True)
x_proj2 = matmul(reshaped_hidden_states, weight_1, transpose_a=False, transpose_b=True)
swiglu = swish(x_proj, beta=expert_beta)
x_proj = x_proj2 * swiglu
down_proj = matmul(x_proj, weight_2, transpose_a=False, transpose_b=True)
The GPU plugin request is to transpose those weights at the conversion stage, so both MatMul transpose_a/transpose_b attributes should be False at this point:
Suggested change:

# Experts computation part (GEMM3_SWIGLU)
x_proj = matmul(reshaped_hidden_states, weight_0, transpose_a=False, transpose_b=False)
x_proj2 = matmul(reshaped_hidden_states, weight_1, transpose_a=False, transpose_b=False)
swiglu = swish(x_proj, beta=expert_beta)
x_proj = x_proj2 * swiglu
down_proj = matmul(x_proj, weight_2, transpose_a=False, transpose_b=False)
cc: @yeonbok
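For reference, a minimal NumPy sketch (not part of the spec, and not OpenVINO code) showing that pre-transposing the expert weights at the conversion stage is numerically equivalent to keeping transpose_b=True on the MatMuls; the shapes, weight names, and the beta handling below are illustrative assumptions:

```python
import numpy as np

def swish(x, beta=1.0):
    # SiLU/Swish activation used inside the SwiGLU expert block
    return x / (1.0 + np.exp(-beta * x))

tokens, hidden, intermediate = 4, 8, 16
x = np.random.rand(tokens, hidden).astype(np.float32)
w0 = np.random.rand(intermediate, hidden).astype(np.float32)  # gate projection
w1 = np.random.rand(intermediate, hidden).astype(np.float32)  # up projection
w2 = np.random.rand(hidden, intermediate).astype(np.float32)  # down projection

# Current spec pattern: transpose_b=True, i.e. x @ W^T
out_transposed = (swish(x @ w0.T) * (x @ w1.T)) @ w2.T

# GPU plugin request: transpose the weights once at conversion stage,
# then run plain MatMuls with transpose_a=False, transpose_b=False
w0_t, w1_t, w2_t = w0.T.copy(), w1.T.copy(), w2.T.copy()
out_plain = (swish(x @ w0_t) * (x @ w1_t)) @ w2_t

assert np.allclose(out_transposed, out_plain)
```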
…ts/operation-specs/internal/moe.rst
@@ -0,0 +1,151 @@
.. {#openvino_docs_ops_internal_MOE}

MOE
Let us not use the MoE name, because we may want to use it for an external operation and for a real MoE operation. Right now it is a sort of FusedExperts.
The routing weights and indices are provided as inputs, so the core MOE idea is preserved; the final multiplication and ReduceSum are included.
I would keep the name as is, to make the current purpose clear.
The MOE internal op can be refactored as needed in the future, and possibly extended with a Router.
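As a hedged illustration of the "final multiplication and ReduceSum" mentioned above (tensor names and shapes are assumptions for this sketch, not the spec), the routing weights could be applied like this:

```python
import numpy as np

tokens, hidden, topk = 4, 8, 2
# Outputs of the selected experts for each token: [tokens, topk, hidden]
expert_outputs = np.random.rand(tokens, topk, hidden).astype(np.float32)
# Normalized routing weights coming from the external router: [tokens, topk]
routing_weights = np.random.rand(tokens, topk).astype(np.float32)
routing_weights /= routing_weights.sum(axis=-1, keepdims=True)

# Final multiplication by the routing weights, then ReduceSum over the expert axis
weighted = expert_outputs * routing_weights[..., None]  # [tokens, topk, hidden]
output_hidden_states = weighted.sum(axis=1)             # [tokens, hidden]
```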
.. code-block:: py
   :force:

   # Common part: Reshape hidden states and prepare for expert computation
I propose to add router_topk_output_indices into this logic. It will show how the weights are prepared. Currently it is not clear how router_topk_output_indices is used in the specified operation.
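To make the suggestion concrete, here is a hedged NumPy sketch of one possible way router_topk_output_indices could be used to prepare dense per-expert weights; tensor names and shapes are assumptions for illustration, not the specified behaviour of the internal op:

```python
import numpy as np

tokens, num_experts, topk = 4, 8, 2
# Assumed inputs: top-k expert indices and their routing weights per token
router_topk_output_indices = np.random.randint(0, num_experts, size=(tokens, topk))
router_topk_output_weights = np.random.rand(tokens, topk).astype(np.float32)

# Scatter the top-k weights into a dense [tokens, num_experts] map;
# experts that were not selected keep a zero weight
dense_weights = np.zeros((tokens, num_experts), dtype=np.float32)
np.put_along_axis(dense_weights, router_topk_output_indices,
                  router_topk_output_weights, axis=1)

# dense_weights[t, e] can then scale expert e's output for token t
# before the ReduceSum over the expert axis.
```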
Good job! Thank you, Kasia. I left a couple of comments.
Co-authored-by: Tatiana Savina <[email protected]>
… experts into MOE (#32183)

### Details:
This transformation is for compile time and is not enabled by default; it should be enabled in each plugin with MOE plugin support. Example registration of the fusion transformation for the CPU plugin: 41145cf

- Fuse vectorized MatMul experts into MOE for the 3GEMMs and 2GEMMs patterns:

  ```
  class ov::pass::VectorizedExpertsFusion : public ov::pass::GraphRewrite {
  public:
      OPENVINO_GRAPH_REWRITE_RTTI("VectorizedExpertsFusion");
      VectorizedExpertsFusion() {
          add_matcher<ov::pass::FuseVectorizedMOE2GEMM>();
          add_matcher<ov::pass::FuseVectorizedMOE3GEMM>();
      }
  };
  ```

- Add internal MOE op

MOE internal op spec PR:
- #32255

## Preliminary requirements (offline transformations):
- Patterns match MatMul (transpose_a=False, transpose_b=**True**); for batched MatMuls a preliminary update of MatMulConstTransposesExtraction is needed:
  - #32378
- Fusion of separate MatMul experts into a vectorized (batched) MatMul:
  - #32199

### Tickets:
- transformation (and fusion details): 173663, op: 171913
Details:
- They will not appear in the converted model public IR.
- Describes MOE used in PR:

Tickets: